AITopics | question-only model

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Neural Information Processing Systems, Mar-16-2026, 21:29:20 GMT

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training, e.g., overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary, discouraging the VQA model from capturing language biases in its question encoding. Further, we leverage this question-only model to estimate the mutual information between the image and answer given the question, which we maximize explicitly to encourage visual grounding. Our approach is a model-agnostic training procedure and is simple to implement. We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models, achieving state-of-the-art on this task. Further, on standard VQA tasks, our approach shows a significantly smaller drop in accuracy compared to existing bias-reducing VQA models.
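As a rough illustration of the training signal the abstract describes, the sketch below computes, for one example, the answer loss of the full VQA model, the answer loss of the question-only adversary, and the entropy gap H(A|Q) - H(A|Q,I), which serves as a per-example estimate of the mutual information between image and answer given the question. The function name `vqa_side_objective`, the weights `lambda_q` and `lambda_h`, and the way the three terms are combined are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def vqa_side_objective(vqa_logits, qonly_logits, target,
                       lambda_q=0.1, lambda_h=0.1):
    """Quantity the VQA model (including its question encoder) would minimize.

    vqa_logits   -- answer scores from the full model (question + image)
    qonly_logits -- answer scores from the question-only adversary
    target       -- index of the ground-truth answer
    lambda_q, lambda_h -- hypothetical weights, chosen only for this sketch
    """
    p_vqa = softmax(vqa_logits)
    p_q = softmax(qonly_logits)

    loss_vqa = -math.log(p_vqa[target])  # answer loss of the full model
    loss_q = -math.log(p_q[target])      # answer loss of the adversary

    # H(A|Q) - H(A|Q,I): how much the image sharpens the answer distribution.
    info_gain = entropy(p_q) - entropy(p_vqa)

    # The adversary's loss enters with a minus sign (the encoder tries to make
    # the adversary fail), and the information gain is rewarded.
    total = loss_vqa - lambda_q * loss_q - lambda_h * info_gain
    return {"loss_vqa": loss_vqa, "loss_q": loss_q,
            "info_gain": info_gain, "total": total}


# The adversary is at chance (uniform), the full model is confident and correct.
example = vqa_side_objective([4.0, 0.0, 0.0], [0.0, 0.0, 0.0], target=0)
```

In an actual adversarial setup the question-only adversary would separately minimize its own `loss_q`; only the gradient reaching the shared question encoder would carry the flipped sign.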

artificial intelligence, name change, proceedings, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.59)

Technology: Information Technology > Artificial Intelligence (0.79)

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

Neural Information Processing Systems, Feb-13-2026, 00:56:33 GMT

Neural Information Processing Systems http://nips.cc/

question-only adversary, question-only model, vqa model, (14 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Neural Information Processing Systems, Nov-20-2025, 22:21:56 GMT

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training, e.g., overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary, discouraging the VQA model from capturing language biases in its question encoding. Further, we leverage this question-only model to estimate the mutual information between the image and answer given the question, which we maximize explicitly to encourage visual grounding. Our approach is a model-agnostic training procedure and is simple to implement. We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models, achieving state-of-the-art on this task. Further, on standard VQA tasks, our approach shows a significantly smaller drop in accuracy compared to existing bias-reducing VQA models.

adversarial regularization, name change, vqa model, (3 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.59)

Technology: Information Technology > Artificial Intelligence (0.79)

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

Neural Information Processing Systems, Nov-20-2025, 17:03:01 GMT

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training, e.g. ...

machine learning, natural language, question answering, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Reviews: Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Neural Information Processing Systems, Oct-7-2024, 11:47:50 GMT

This paper studies the problem of handling language/text priors in the task of visual question answering. The strong performance of many state-of-the-art VQA systems is achieved largely by learning a better question encoding that captures the correlations between questions and answers while ignoring the image information, so the problem is important to the VQA research community. In general, the paper is well written and easy to follow. Some concerns and suggestions follow: 1) The major concern is the basic intuition behind the question-only adversary: the question encoding q_i from the question encoder is not necessarily the same bias that leads the VQA model f to ignore the visual content. Since f can be a deep neural network, for example a deep RNN or a deep RNN-CNN that leverages both the question embedding and the visual embedding, the non-linearity in f could turn the question embedding into an image-aware representation when generating the answer distribution.

adversarial regularization, information, question-only model, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)

Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances

Yike Wu, Yu Zhao, Shiwan Zhao, Ying Zhang, Xiaojie Yuan, Guoqing Zhao, Ning Jiang

arXiv.org Artificial Intelligence, Sep-18-2022

Despite the great progress of Visual Question Answering (VQA), current VQA models rely heavily on the superficial correlation between the question type and its corresponding frequent answers (i.e., language priors) to make predictions, without really understanding the input. In this work, we define training instances with the same question type but different answers as superficially similar instances, and attribute the language priors to the confusion of the VQA model on such instances. To solve this problem, we propose a novel training framework that explicitly encourages the VQA model to distinguish between the superficially similar instances. Specifically, for each training instance, we first construct a set that contains its superficially similar counterparts. Then we exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space. In this way, the VQA model is forced to further focus on the other parts of the input beyond the question type, which helps to overcome the language priors. Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2. Code is available at https://github.com/wyk-nku/Distinguishing-VQA.git (Distinguishing-VQA).
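The set-construction step described in this abstract can be sketched as follows: for each training instance, collect the indices of instances that share its question type but carry a different answer. The function name `build_counterpart_sets` and the dictionary keys `question_type` and `answer` are hypothetical, chosen only for this illustration; the distinguishing module itself is not reproduced here.

```python
from collections import defaultdict


def build_counterpart_sets(instances):
    """Map each instance index to its superficially similar counterparts.

    Two instances count as superficially similar when they share a question
    type but have different answers. The field names are assumptions.
    """
    # Group instance indices by question type.
    by_type = defaultdict(list)
    for idx, inst in enumerate(instances):
        by_type[inst["question_type"]].append(idx)

    # Keep only same-type instances whose answer differs.
    counterparts = {}
    for idx, inst in enumerate(instances):
        counterparts[idx] = [
            other for other in by_type[inst["question_type"]]
            if instances[other]["answer"] != inst["answer"]
        ]
    return counterparts


toy_instances = [
    {"question_type": "what sport", "answer": "tennis"},
    {"question_type": "what sport", "answer": "baseball"},
    {"question_type": "what sport", "answer": "tennis"},
    {"question_type": "what room", "answer": "kitchen"},
]
sets = build_counterpart_sets(toy_instances)
```

Here the "baseball" instance is paired with both "tennis" instances, while the lone "what room" instance has no counterparts, so a distance-based term would have nothing to push against for it.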

artificial intelligence, machine learning, question type, (19 more...)

arXiv.org Artificial Intelligence

arXiv:2209.08529

Country: Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

Neural Information Processing Systems, Feb-14-2020, 08:13:50 GMT

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training, e.g., overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary, discouraging the VQA model from capturing language biases in its question encoding. Further, we leverage this question-only model to estimate the mutual information between the image and answer given the question, which we maximize explicitly to encourage visual grounding.

adversarial regularization, question-only model, vqa model, (1 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.61)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.64)

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

Neural Information Processing Systems, Dec-31-2018

Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training, e.g., overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. Most alarmingly, this shortcoming is often not well reflected during evaluation because the same strong priors exist in test distributions; however, a VQA system that fails to ground questions in image content would likely perform poorly in real-world settings. In this work, we present a novel regularization scheme for VQA that reduces this effect. We introduce a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed. We then pose training as an adversarial game between the VQA model and this question-only adversary, discouraging the VQA model from capturing language biases in its question encoding. Further, we leverage this question-only model to estimate the mutual information between the image and answer given the question, which we maximize explicitly to encourage visual grounding. Our approach is a model-agnostic training procedure and is simple to implement. We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models, achieving state-of-the-art on this task. Further, on standard VQA tasks, our approach shows a significantly smaller drop in accuracy compared to existing bias-reducing VQA models.

machine learning, natural language, question answering, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)
